Efficient Support for Matrix Computations on Heterogeneous Multi-core and Multi-GPU Architectures

Authors

  • Fengguang Song
  • Stanimire Tomov
  • Jack Dongarra
Abstract

We present a new methodology for utilizing all CPU cores and all GPUs on a heterogeneous multicore and multi-GPU system to support matrix computations efficiently. Our approach is able to achieve four objectives: a high degree of parallelism, minimized synchronization, minimized communication, and load balancing. Our main idea is to treat the heterogeneous system as a distributed-memory machine, and to use a heterogeneous 1-D block cyclic distribution to allocate data to the host system and GPUs to minimize communication. We have developed heterogeneous rectangular-tile algorithms with two different tile sizes (one for CPU cores and the other for GPUs) to cope with processor heterogeneity. We also propose an auto-tuning method to determine the best tile sizes to attain both high performance and load balancing. We have implemented a new runtime system and applied it to the rectangular tile Cholesky and QR factorizations. Our experiments on a compute node with two Intel Westmere hexa-core CPUs and three Nvidia Fermi GPUs demonstrate the weak scalability, strong scalability, load balance, and efficiency of our approach.
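To illustrate the idea of a heterogeneous 1-D block cyclic distribution, here is a minimal sketch, not the authors' implementation. It assumes tile columns are dealt out in a repeating cycle in which each device receives a number of consecutive columns proportional to a weight. The function name `heterogeneous_block_cyclic`, the device names, and the 1:3 host-to-GPU weights are hypothetical and chosen only for illustration; the abstract does not state the actual proportions.

```python
from collections import Counter


def heterogeneous_block_cyclic(num_tile_columns, weights):
    """Map each tile-column index to the device that owns it.

    weights: dict of device name -> number of consecutive tile columns
    that device receives per cycle (hypothetical proportions).
    """
    # One cycle lists every device as many times as its weight.
    cycle = [dev for dev, w in weights.items() for _ in range(w)]
    if not cycle:
        raise ValueError("at least one device needs a positive weight")
    # Column j belongs to whichever device sits at position j in the
    # repeating cycle.
    return [cycle[j % len(cycle)] for j in range(num_tile_columns)]


# Assumed setup: one host (all CPU cores together) and three GPUs,
# with each GPU taking three tile columns for every one on the host.
weights = {"host": 1, "gpu0": 3, "gpu1": 3, "gpu2": 3}
owners = heterogeneous_block_cyclic(20, weights)
counts = Counter(owners)
```

Because ownership is a fixed function of the column index, every device can determine where a tile lives without communication, and the weights give faster devices a proportionally larger share of the matrix.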

Similar resources

Enabling and Scaling Matrix Computations on Heterogeneous Multi-Core and Multi-GPU Systems

We present a new approach to utilizing all CPU cores and all GPUs on heterogeneous multicore and multi-GPU systems to support dense matrix computations efficiently. The main idea is that we treat a heterogeneous system as a distributed-memory machine, and use a heterogeneous multi-level block cyclic distribution method to allocate data to the host and multiple GPUs to minimize communication. We ...

Matrix Multiplication on High-Density Multi-GPU Architectures: Theoretical and Experimental Investigations

Matrix multiplication (MM) is one of the core problems in the high-performance computing domain, and its efficiency affects the performance of almost all matrix problems. The high-density multi-GPU architecture escalates the complexity of this classical problem, though it greatly exceeds the capacities of previous homogeneous multicore architectures. In order to fully exploit the potential of suc...

Multigrid for Matrix-Free Finite Element Computations on Graphics Processors

In this paper, we consider matrix-free finite-element techniques for efficient numerical solution of partial differential equations on modern manycore processors such as graphics cards. We present a GPU parallelization of a completely matrix-free geometric multigrid iterative solver, with support for general curved and adaptively refined meshes with hanging nodes. Comparing our implementation r...

Performance Evaluation and Analysis for Conjugate Gradient Solver on Heterogeneous (Multi-GPUs/Multi-CPUs) platforms

High performance computing (HPC) is a technology that allows computationally intensive problems to be solved in a reasonable amount of time, and it offers many advantages for large applications in various fields of science and industry. Current multi-core processors, especially graphics processing units (GPUs), have quickly evolved to become efficient accelerators for data-parallel computing. They can mai...

Parallel Algorithms for Constructing Data Structures for Fast Multipole Methods

We present efficient algorithms to build the data structures and lists needed for fast multipole methods. The algorithms can be implemented efficiently on serial, data-parallel GPU, and distributed architectures. With these algorithms it is possible to map the FMM efficiently onto the GPU or distributed heterogeneous CPU-GPU systems. Further, in dynamic problems, as the di...

Publication date: 2011